Deterministic execution of multithreaded applications for reliability of multicore systems
نویسنده
چکیده
With the advent of modern nano-scale technology, it has become possible to implement multiple processing cores on a single die. The shrinking transistor sizes however have made reliability a concern for such systems as smaller transistors are more prone to permanent as well as transient faults. To reduce the probability of failures of such systems, online fault tolerance techniques can be applied. These techniques need to be efficient as they execute concurrently with applications running on such systems. This paper discusses the challenges involved in online fault tolerance and existing work which tackles these challenges. We classify fault tolerance into four different steps which are proactive fault management, error detection, fault diagnosis and recovery and discuss related work for each step, with focus on techniques for shared memory multicore/multiprocessor systems. We also highlight the additional difficulties in tolerating faults for parallel execution on shared memory multicore/multiprocessor systems.
منابع مشابه
The Flight Data Recorder Continually Logs Memory Races in a Multithreaded Execution, Enabling the Deterministic Replay Invaluable for Debugging
......As hardware vendors transition to multicore chips, software vendors face increased software reliability challenges. To effectively debug software in this new world, developers must be able to replay executions that exhibit a bug so that they can zero in on concurrency bugs—especially intermittent ones. Such deterministic replay also aids fault detection and recovery, intrusion detection, ...
متن کاملA Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
متن کاملDTHREADS: Efficient and Deterministic Multithreading
Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors in multithreaded applications. One way to simplify multithreaded programming is to enforce deterministic execution. However, past deterministic systems are incomplete or impractical. Language-based approaches require programmers to write...
متن کاملLow-cost Hardware Fault Detection and Diagnosis for Multicore Systems
Continued technology scaling is resulting in systems with billions of devices. Consequently, these devices are are prone to failures from various sources resulting in a growing reliability threat. As this reliability problem is expected to affect the broad computing market, traditional solutions involving high redundancy, or piecemeal solutions targeting specific failure modes will no longer be...
متن کاملProgram Execution on Reconfigurable Multicore Architectures
Based on the two observations that diverse applications perform better on different multicore architectures, and that different phases of an application may have vastly different resource requirements, Pal et al. proposed a novel reconfigurable hardware approach for executing multithreaded programs. Instead of mapping a concurrent program to a fixed architecture, the architecture adaptively rec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015